candidate solution
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Test-Time Scaling (TTS) improves the performance of Large Language Models (LLMs) by using additional inference-time computation to explore multiple reasoning paths through search. Yet how to allocate a fixed rollout budget most effectively during search remains underexplored, often resulting in inefficient use of compute at test time. To bridge this gap, we formulate test-time search as a resource allocation problem and derive the optimal allocation strategy that maximizes the probability of obtaining a correct solution under a fixed rollout budget. Within this formulation, we reveal a core limitation of existing search methods: solution-level allocation tends to favor reasoning directions with more candidates, leading to theoretically suboptimal and inefficient use of compute. To address this, we propose Direction-Oriented Resource Allocation (DORA), a provably optimal method that mitigates this bias by decoupling direction quality from candidate count and allocating resources at the direction level. To demonstrate DORA's effectiveness, we conduct extensive experiments on challenging mathematical reasoning benchmarks including MATH500, AIME2024, and AIME2025. The empirical results show that DORA consistently outperforms strong baselines with comparable computational cost, achieving state-of-the-art accuracy. We hope our findings contribute to a broader understanding of optimal TTS for LLMs.1
Details
A.1 Omitted Proofs (Details for Lemma 3) Clustering Nets Next, we give a detailed proof of Lemma 4. Proof of Lemma 4. Our objective is to generate a small set of cost vectors that satisfy the desired guarantee. We first define the cost vectors (the reader familiar with the proof sketch from the main body of the submission may skip this and the next paragraph). For each subset U of size O(min(α 22i,α 2+ki), we consider the the subspace ΠU spanned by U. In this subspace we consider (α/2i) p cost(p,A)nets of every ball centered around ΠUpwith radius 60 2i/2 p cost(p,A)for all p P. Such a net has size exp(γ rank(U)ilogα), for some constant γ and there exist at most P 0 exp(γ |U|ilogα) Furthermore, there are at most P 0|U| = P |U|0 such subsets. Now, for every point p, define an exponential sequence α2(1 + α/2i)j for j {0,...log102i}. There exist at most P 0 such sequences and every such sequence consists of at most O(α 1 2i i) many values. We combine every net point in ever ball of every subspace with all values in the exponential sequence to obtain the evaluation for a single candidate center.
GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems
Henclová, Kateřina, Šmídl, Václav
Selecting interpretable feature sets in underdetermined ($n \ll p$) and highly correlated regimes constitutes a fundamental challenge in data science, particularly when analyzing physical measurements. In such settings, multiple distinct sparse subsets may explain the response equally well. Identifying these alternatives is crucial for generating domain-specific insights into the underlying mechanisms, yet conventional methods typically isolate a single solution, obscuring the full spectrum of plausible explanations. We present GEMSS (Gaussian Ensemble for Multiple Sparse Solutions), a variational Bayesian framework specifically designed to simultaneously discover multiple, diverse sparse feature combinations. The method employs a structured spike-and-slab prior for sparsity, a mixture of Gaussians to approximate the intractable multimodal posterior, and a Jaccard-based penalty to further control solution diversity. Unlike sequential greedy approaches, GEMSS optimizes the entire ensemble of solutions within a single objective function via stochastic gradient descent. The method is validated on a comprehensive benchmark comprising 128 synthetic experiments across classification and regression tasks. Results demonstrate that GEMSS scales effectively to high-dimensional settings ($p=5000$) with sample size as small as $n = 50$, generalizes seamlessly to continuous targets, handles missing data natively, and exhibits remarkable robustness to class imbalance and Gaussian noise. GEMSS is available as a Python package 'gemss' at PyPI. The full GitHub repository at https://github.com/kat-er-ina/gemss/ also includes a free, easy-to-use application suitable for non-coders.
Appendix APerformanceonreal-worldbasedinstances
We further evaluate SGBS+EAS on nine real-world based instance sets from [15]. Each instance set consists of 20 instances that have similar characteristics (i.e., they have been sampled from the same underlying distribution). To account for this new evaluation setting, we always perform 10 runs in parallel for EAS and SGBS+EAS. This improves the solution quality, while leading only to a slight increase of the requiredruntime. For SGBS+EAS we set (β, γ) = (35,5), the learning rate α = 0.005 and λ = 0.05.